Executive Summary


Computer capacity was, and marginally still is, a limiting factor for the development
of basin-wide, eddy-resolving ocean circulation models. For example, the North Atlantic
Ocean configuration should extend from the American (85W) to European/African (0E)
coasts, from 20S (to include the Tropical Convergence Zone) to the Fram Straits (75N).
A 1/4 degree resolution (the average grid spacing for a minimum parameterization of
mesoscale variability) requires a horizontal mesh of about (344 x 384) points. To
reproduce the vertical structure of the ocean dynamics and include the interactions of the
several water masses that are present in the basin, the vertical resolution should include
at least 15-20 collocation points. This implies a grid mesh of about 2x10^6 points, with a
heavy burden on computer memory and computation requirements. Parallel processing
may alleviate this problem. In view of basin-wide applications, it makes sense to divide
the data matrix for the basin into subdomains and to allocate one CPU for each
subdivision of the original matrix. With this allocation, each processor updates its matrix and
then propagates the boundary values to its neighbors.
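
As a rough check of the figures quoted above, the following C sketch counts grid points and estimates storage. The function names are illustrative, and the choice of 5 double-precision variables per point is an assumption borrowed from the estimate used later in Section 4; it is not a statement about any particular model.

```c
#include <assert.h>

/* Number of grid points in an nx-by-ny horizontal mesh with nz levels. */
long grid_points(long nx, long ny, long nz)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz;
}

/* Storage in bytes, assuming vars_per_point double-precision values are
 * kept at every grid point (hypothetical parameter for this sketch). */
long grid_bytes(long nx, long ny, long nz, long vars_per_point)
{
    assert(vars_per_point > 0);
    return grid_points(nx, ny, nz) * vars_per_point * (long)sizeof(double);
}
```

Under these assumptions, grid_points(344, 384, 15) gives 1,981,440 points, which is the "about 2x10^6" quoted above, and 20 levels give 2,641,920 points.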

In order to provide an adequate representation of the mesoscale features, an
alternative is to develop a modeling system that connects a large-scale, basin-wide model with
high-resolution, regional models concentrated in selected areas of the domain. However,
a few unsolved questions are associated with such an approach. First of all, it is
necessary to implement nesting algorithms that ensure a correct transfer of energy between
the coarse and fine grids. Moreover, it is necessary to develop appropriate
computational techniques that keep a continuous exchange of information as the computations
proceed. Again, parallel processing can provide such a tool: two distinct networks of
processors perform the coarse and fine grid computations separately; the updated interface
variables are communicated to a central server that applies the nesting algorithms and
transmits the new data back to the networks.

This document introduces a new parallel processing software system and evaluates
its feasibility for ocean circulation modeling. Applications will be presented
in a following document.


Abstract 


Glenda is an environment for parallel processing which is modeled after the Linda
language and utilizes the PVM software system to provide underlying communications.
The resulting software maintains the friendly parallel programming typical of Linda and
the efficiency of PVM in message-passing operations.

This document describes the functions, data structures, and algorithms of the Glenda
software architecture.


1  Introduction 


Glenda is a parallel programming system modeled after the Linda system, which is an
easily learned parallel programming environment (Carriero and Gelernter, 1990). Glenda is
implemented using PVM (Parallel Virtual Machine) for communications.

PVM, developed by Oak Ridge National Laboratory (Beguelin et al., 1991), is a
software system that allows the creation of and access to a concurrent computing system
made from networks of loosely-coupled processing elements. The hardware collected
into a user machine may be single-processor workstations, vector machines, parallel
supercomputers, or any mixture of the above.

To be as general and flexible as possible, PVM is based on a parallel
message-passing model. That is, the programmer must pack each item of a message into a
message buffer prior to sending it and, similarly, unpack the message components upon
receiving a message.

The  major  attributes  of  PVM  are: 

•  Can  use  a  workstation  network  and/or  a  multi-CPU  system 

•  Architectures  can  be  mixed  on  a  virtual  machine 

•  Most  popular  architectures  are  supported 

•  Asynchronous  message  passing  model 

•  Messages  consist  of  multiple  components  of  eight  different  data  types 

•  Messages  are  accessed  by  an  integer  message  type 

•  Supports  barriers  and  signals 

•  Supports  Fortran  and  C 

•  Consists  of  53  functions  for  C  and  36  for  Fortran 

Linda, a registered trademark of Scientific Computing Associates, is a process
control model which is based on a global storage system called tuple space. Processes
may input or output tuples of various lengths into the tuple space, using pattern matching on
tuple contents. This can be applied to model both the passing of directed messages and
broadcasting.

Linda tuples are manipulated using six basic operations, each of which either
produces or consumes a tuple, viz.:

•  out  produces  a  tuple 

•  in  consumes  a  tuple 

•  inp  consumes  a  tuple  if  available 

•  rd  reads  a  tuple 

•  rdp  reads  a  tuple  if  available 

•  eval  executes  a  function  in  a  subprocess  producing  a  tuple 
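
To make the produce/consume distinction concrete, here is a minimal single-process sketch of a tuple space in C. It is not Linda or Glenda code: the names ts_out, ts_inp and ts_rdp, the fixed (name, key, value) tuple shape, and the fixed capacity are all assumptions made for illustration. Only out and the two non-blocking predicates are shown, because the blocking in and rd, as well as eval, need concurrent processes.

```c
#include <assert.h>
#include <string.h>

#define TS_CAPACITY 64   /* arbitrary size for this sketch */
#define TS_NAME_LEN 16

struct tuple {
    char name[TS_NAME_LEN];
    int  key;
    int  value;
    int  used;           /* 0 = free slot, 1 = holds a tuple */
};

static struct tuple space[TS_CAPACITY];   /* zero-initialized: all free */

/* out: produce a tuple; returns 1 on success, 0 if it cannot be stored. */
int ts_out(const char *name, int key, int value)
{
    int i;
    assert(name != NULL);
    if (strlen(name) >= TS_NAME_LEN)
        return 0;
    for (i = 0; i < TS_CAPACITY; i++) {
        if (!space[i].used) {
            strcpy(space[i].name, name);
            space[i].key = key;
            space[i].value = value;
            space[i].used = 1;
            return 1;
        }
    }
    return 0;
}

/* Shared lookup: match on name and key, optionally consuming the tuple. */
static int ts_find(const char *name, int key, int *value, int consume)
{
    int i;
    assert(name != NULL && value != NULL);
    for (i = 0; i < TS_CAPACITY; i++) {
        if (space[i].used && space[i].key == key &&
            strcmp(space[i].name, name) == 0) {
            *value = space[i].value;
            if (consume)
                space[i].used = 0;
            return 1;
        }
    }
    return 0;
}

/* inp: consume a matching tuple if one is available. */
int ts_inp(const char *name, int key, int *value)
{
    return ts_find(name, key, value, 1);
}

/* rdp: read a matching tuple if available, leaving it in the space. */
int ts_rdp(const char *name, int key, int *value)
{
    return ts_find(name, key, value, 0);
}
```

In this sketch a tuple can be read with ts_rdp any number of times, but ts_inp succeeds only once per tuple, which mirrors the difference between reading and consuming.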

When using the Linda model and agenda parallelism, worker processes are all equally
capable and retrieve tasks from an agenda until all work is done. One master process
starts worker processes using the eval function. The master then sends tasks using
some agenda. Each worker program consists of a loop which retrieves tasks from the
agenda and sends results back to the master process. When all the data have been
received, the master process sends tasks with recognizable illegal requests, termed
“poison pills”, to the worker processes to terminate their execution.

Although the six Linda operations are adequate for performing parallel programming
and are simpler to use than the analogous PVM functions, there are some significant
problems to overcome in making Linda as efficient as PVM in a message-passing
environment. The greatest difficulty is to avoid excessive communications.

Glenda was developed to provide the portability and efficiency of PVM with the
ease-of-use of Linda (Seyfarth et al., 1994). Glenda’s primary goals are:

•  Preserve Linda’s global tuple space model

•  Reduce  the  number  of  functions  required  by  PVM 

•  Maintain  PVM’s  message  passing  efficiency  whenever  required 

•  Maintain  PVM’s  portability 

•  Present  Linda  as  an  integral  part  of  the  host  language. 

Glenda applies the PVM system to perform communications and manages the global
tuple space in a manner similar to that used with the Linda language. If the global tuple
space is assigned to a single processor, applications may experience excessive
network traffic. To avoid the problem, operations have been implemented that provide
direct tuple operations among the Glenda processes.

2  Glenda  Primary  Functions 

2.1  Glenda  Task  Control  Functions 

There  are  three  operations  that  control  the  task  activities: 

•  tid = gl_mytid()

The operation starts Glenda’s activities and returns an integer which identifies the
calling task in subsequent Glenda and PVM function calls. A process must call
gl_mytid before any other Glenda operation.

•  tid = gl_spawn(name [, host])

A new task is started by a call to gl_spawn. The name parameter is the name
of the executable file for the new task. The host parameter allows the optional
designation of a specific computer to execute the program. The return value is a
PVM task id. The newly-spawned task must call gl_mytid before executing
any other Glenda functions.

•  gl_exit()

The purpose is to perform any required cleanup before the program exits and to
inform the tuple server that the task has been completed. In turn, gl_exit calls
pvm_exit to inform PVM also of the task completion. However, gl_exit
does not call exit and, consequently, the program continues execution.

2.2  Glenda  Global  Tuple  Operations 

There  are  five  Glenda  operations  which  manipulate  tuples  in  the  global  tuple  space. 
These  are  named  and  modeled  after  the  related  Linda  operations,  viz.: 

•  gl_out(name, ...) places a tuple into the global tuple space

•  gl_in(name, ...) gets a tuple from the global tuple space

•  gl_inp(name, ...) is a predicate version of gl_in. It returns 1 if a
matching tuple exists and it gets the data for the tuple. It returns 0 if no matching tuple
exists.

•  gl_rd(name, ...) is similar to gl_in. The difference is that it retrieves a
copy of the matching tuple data without removing the tuple from the global tuple
space.

•  gl_rdp(name, ...) is the predicate version of gl_rd.

All  of  these  operations  syntactically  resemble  C  function  calls  with  a  variable  number 
of  parameters.  However,  there  are  a  few  differences.  In  the  Glenda  language,  the  first 
parameter  for  the  calls  is  always  the  name  of  the  tuple.  This  is  either  a  C  string  constant 
or  a  C  array  with  a  NULL-terminated  string.  The  name  is  used  by  the  tuple  server  to 
rapidly  identify  the  tuples.  The  remaining  parameters  are  components  of  a  tuple  and 
may  be  C  constants,  scalar  variables  or  arrays  of  the  basic  C  data  types.  In  read  and 
input operations, a parameter may be preceded by a question mark, indicating that the
following variable is to be filled with the value from the matching tuple. If the variable is
not preceded by a question mark, the operation must find a tuple with the same value for
that parameter.
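
The matching rule described above can be sketched in plain C as follows. This is only an illustration of the idea, not Glenda's implementation: the struct field type, the match_tuple function, and the restriction to integer components are assumptions, and the tuple name (which Glenda compares first) is left out.

```c
#include <assert.h>

/* One pattern component: an actual value that must be equal, or a formal
 * (written "? var" in Glenda) that receives the tuple's value on a match. */
struct field {
    int  is_formal;   /* 1 for "? var", 0 for an actual value */
    int  actual;      /* compared when is_formal == 0 */
    int *target;      /* written when is_formal == 1 */
};

/* Returns 1 and fills the formals if tuple[0..n-1] matches the pattern;
 * otherwise returns 0 and leaves every target untouched. */
int match_tuple(const int *tuple, const struct field *pattern, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)                 /* pass 1: compare actuals */
        if (!pattern[i].is_formal && pattern[i].actual != tuple[i])
            return 0;
    for (i = 0; i < n; i++)                 /* pass 2: bind formals */
        if (pattern[i].is_formal)
            *pattern[i].target = tuple[i];
    return 1;
}
```

Checking every actual before writing any formal keeps the caller's variables unchanged when the match fails, which is the behavior one would expect from the predicate operations.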

Array  components  of  a  tuple  may  be  followed  by  two  optional  fields  that  define  the 
length  and  stride  of  the  array.  When  specified,  the  array  length  value  is  separated  from 
the array name by a colon. The specification is optional if the length is known to the
C-Glenda preprocessor. In gl_out, the length specifies how many array elements are
to  be  copied  into  the  tuple.  For  input  and  read  operations,  the  length  field  is  an  integer 
variable  indicating  the  length  of  the  array  to  be  received. 

After  the  array  length,  the  array  stride  parameter  may  be  specified.  It  represents 
the  increment  for  copying  the  array  elements  into  the  tuple.  The  stride  value  is  passed 
unaltered  into  the  appropriate  PVM  functions. 
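
The effect of a length and a stride can be illustrated with a small C helper. The function pack_strided is hypothetical and only mimics the copying step; in Glenda the values are handed to PVM packing routines, which take an item count and a stride in the same spirit.

```c
#include <assert.h>

/* Copy len elements into dst, taking every stride-th element of src.
 * Returns the number of elements copied. */
int pack_strided(int *dst, const int *src, int len, int stride)
{
    int i;
    assert(len >= 0 && stride >= 1);
    for (i = 0; i < len; i++)
        dst[i] = src[i * stride];
    return len;
}
```

For a 3-by-4 matrix stored row by row in a flat array m, the call pack_strided(col, &m[1], 3, 4) collects column 1: the length is the number of rows and the stride is the row width.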

2.3  Glenda  Directed  Tuple  Operations 

In  order  to  improve  the  overall  performance,  Glenda  offers  two  operations  that  send  and 
receive  tuples  without  passing  through  the  tuple  server.  These  two  functions  are: 

•  gl_outto(tid, name, ...) sends a tuple to one or more tasks.

•  gl_into(name, ...) receives a tuple sent directly to the task.

If the first parameter of gl_outto is a single integer variable, the tuple is directed to
the task identified by this id number. If the first parameter is an integer array, it identifies
a collection of tasks to receive the tuple. Since gl_outto sends tuples directly from
task to task, it is necessary to use gl_into for receiving these tuples. Matching within
gl_into is similar to the other input operations, except that tuples that do not match
are saved within the task instead of within the tuple server.

2.4  Examples 

2.4.1  gl_out examples

•  gl_out ( "data", i, k, value )

Here, value can be a scalar or an array. If it is an array, the length is implicit.

•  gl_out ( "row", i, x:len )

Here, the length to output for x is given. x could be an array or a pointer.

•  gl_out ( "column", j, x[j]:len )

Here, we must specify the length to output. The preprocessor only keeps lengths
for single-dimension arrays.

2.4.2  gl_in and gl_inp examples

•  gl_in ( "data", i, k, ? value )
gl_inp ( "data", i, k, ? value )

These match on “data”, i and k, returning data for value. Here, value can be a scalar or
an array. If it is an array, the length is implicit.

•  gl_in ( "row", i, ? x:len )
gl_inp ( "row", i, ? x:len )

These match on “row” and i, returning data for the array x and returning the number
of items as len.

•  gl_in ( "column", j, ? x[j] )
gl_inp ( "column", j, ? x[j] )

Assuming that x is a two-dimensional array, these match on “column” and j,
returning data into x[j]. The length is ignored.

2.4.3  gl_rd and gl_rdp examples

•  gl_rd ( "data", i, k, ? value )
gl_rdp ( "data", i, k, ? value )

These match on “data”, i and k, returning data for value. Here, value can be a scalar or
an array. If it is an array, the length is implicit.

•  gl_rd ( "row", i, ? x:len )
gl_rdp ( "row", i, ? x:len )

These match on “row” and i, returning data for the array x and returning the number
of items as len.

2.4.4  gl_outto and gl_into examples

•  gl_outto ( tid, "data", i, k, value )
gl_into ( "data", i, k, ? value )

In this example, the PVM task id number tid is used to output a tuple containing
“data”, i, k, and value. Here, value can be a scalar or an array. The corresponding gl_into
matches on “data”, i, and k, returning the data for value.

•  gl_outto ( tid:len, "data", i, k, value )
gl_into ( "data", i, k, ? value )

This example is the same as the one above, but instead of being sent to only one
process, the tuple is sent to all of the processes whose PVM task id numbers are
specified in the array tid of dimension len.

3  Glenda  Support  Functions 

For a variety of reasons, it might be desirable to bypass the Glenda tuple matching
operations. The C-Glenda preprocessor provides five operations which pack a collection of
PVM function calls into a send/receive facility. The translated code makes use of the
functions pvm_initsend, pvm_send, and pvm_mcast. The support operations
are:

•  gl_send(tid, msgtag, ...) sends the package

•  gl_recv(tid, msgtag, ...) receives the package

•  gl_wait(tid, msgtag, ...) receives a message and may wait for more
messages.

•  gl_pack packs the message

•  gl_unpack unpacks the message.

In the function gl_send the first parameter can be a single value or an array of
PVM task id numbers, as in gl_outto. The second parameter, an integer, represents
the PVM message type. The remaining parameters are treated like the components of a
tuple.

Both gl_recv and gl_wait receive a message. The operation gl_recv
receives a message and automatically unpacks it. On the other hand, gl_wait simply
receives a message and returns the PVM message value. After calling gl_wait,
either gl_recv or gl_unpack might be used to unpack the message. In such a way,
gl_wait can wait for the reception of more than one message type (msgtag = -1).
For completeness, Glenda also offers gl_pack, an operation which is compatible with
other PVM functions.

4  Glenda  and  Ocean  Circulation  Models 

This document introduces Glenda, a new parallel processing environment which is
modeled after the Linda language and utilizes the PVM software to provide underlying
communications. The long-term objectives of this project are to investigate and evaluate the
feasibility of parallel processing for ocean circulation applications.

Ocean  circulation  models  usually  represent  an  ocean  domain  as  a  3-D  matrix  of  data, 
each  cell  of  the  matrix  consisting  of  about  5  double  precision  values.  It  makes  sense 
to  divide  the  basin  into  subdomains  and  allocate  one  CPU  for  each  subdivision  of  the 
original  matrix.  With  this  allocation,  each  CPU  computes  new  values  for  its  matrix  and 
then  propagates  the  boundary  values  to  its  neighbors.  Then  the  CPUs  would  repeat  this 
process  of  calculating  and  propagating  values  until  the  simulation  is  completed. 

It is clear that the amount of data to be propagated depends upon the coarseness
of the grid and the number of CPUs used for the simulation. Let’s assume that our grid
consists of an N by N square region to be processed by P^2 CPUs. In the figure below
we have a 100x100 grid to be processed by 4^2 = 16 CPUs. Each CPU is responsible for
computing new data for a 25 by 25 sub-matrix.

In the example, the boundary data is shown with dotted lines. There are 3 (P - 1,
in general) horizontal divisions with 2 rows of data to be passed (one row goes up and
the other goes down). Each row is of width 100 (N, in general). This yields a total of
2(P - 1)N cells which must be passed up or down in the matrix for one iteration of
the simulation. The same is true for the vertical divisions, and the total of all cells to be
passed in any direction for one iteration is 4(P - 1)N.
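
The counting argument above can be written as two small C functions. The names submatrix_side and boundary_cells are illustrative, and the sketch assumes that P divides N evenly, as in the 100x100 example.

```c
#include <assert.h>

/* Side length of each square sub-matrix when an N-by-N grid is shared
 * among P*P CPUs; this sketch assumes P divides N evenly. */
int submatrix_side(int n, int p)
{
    assert(p >= 1 && n >= p && n % p == 0);
    return n / p;
}

/* Cells exchanged per iteration: (P - 1) internal divisions in each of the
 * two directions, with two rows (or columns) of N cells per division. */
long boundary_cells(int n, int p)
{
    assert(n >= 1 && p >= 1);
    return 4L * (p - 1) * n;
}
```

For the example in the text, submatrix_side(100, 4) is 25 and boundary_cells(100, 4) is 1200 cells per iteration.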

Let’s  also  assume  that  there  are  D  bytes  of  data  per  cell  to  propagate  after  each  step 
of  the  simulation  and  that  the  data  transfer  rate  in  the  system  is  R  bytes  per  second.  The 
communication  time  for  one  iteration  is  estimated  to  be: 

$$T = \frac{4(P - 1)\,N D}{R}$$

For  a  collection  of  networked  workstations  on  an  Ethernet,  R  is  about  100,000  bytes  per 
second  of  reliable  communication.  This  is  clearly  not  a  large  value  for  R,  but  networks 
are  commonly  busy,  so  this  is  a  fair  estimate. 

For a data matrix of about 350x350 horizontal grid points, 20 vertical collocation
points, and 5 double precision updating variables, the communication time estimate is:

$$T = \frac{4(P - 1) \cdot 350 \cdot 20 \cdot 5 \cdot 8}{100000} = 11.2\,(P - 1)$$


Let’s assume that the total CPU time for one iteration is 64 seconds. If new processors
are added, the communication time increases while the computation time for
one iteration decreases. This imposes a limit on the possible speedup in this environment. With
our Ethernet example, the limit is reached with 4 processors (P = 2). The computation
time estimate is about 16 seconds for modern workstations, and the communication time
estimate is 11.2 seconds. Adding more CPUs would not improve the efficiency of the
system.

It is apparent that Ethernet speed is not a good match for modern CPUs performing
ocean modeling. In the case of more tightly-coupled CPUs, as in a multiprocessor
machine, the communications are internal and R will be reliably over 10 times as fast. The
value R = 10^6 bytes per second yields T = 1.12 * (P - 1). With 16 CPUs (P = 4),
the communication time is T = 3.36 seconds while the computation time drops to 4
seconds. Using more CPUs would cause the communication time to exceed the computation
time.
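
The two estimates can be reproduced with a short C sketch. It applies the criterion used in the text (communication time should not exceed computation time) and assumes ideal scaling, i.e. the 64-second single-CPU time is divided evenly among the P^2 CPUs. All function and parameter names are illustrative.

```c
#include <assert.h>

/* Communication time per iteration, T = 4 (P - 1) N D / R, in seconds. */
double comm_time(int p, int n, double bytes_per_cell, double rate)
{
    assert(p >= 1 && n >= 1 && rate > 0.0);
    return 4.0 * (p - 1) * n * bytes_per_cell / rate;
}

/* Computation time per iteration on P*P CPUs, assuming ideal scaling. */
double compute_time(int p, double single_cpu_seconds)
{
    assert(p >= 1);
    return single_cpu_seconds / ((double)p * p);
}

/* Largest P (up to p_max) for which the communication time does not
 * exceed the computation time. */
int largest_useful_p(int p_max, int n, double bytes_per_cell,
                     double rate, double single_cpu_seconds)
{
    int p, best = 1;
    for (p = 1; p <= p_max; p++)
        if (comm_time(p, n, bytes_per_cell, rate) <=
            compute_time(p, single_cpu_seconds))
            best = p;
    return best;
}
```

With N = 350 and D = 20 * 5 * 8 = 800 bytes per horizontal cell, largest_useful_p returns 2 for R = 10^5 bytes per second and 4 for R = 10^6 bytes per second, in line with the figures above. A different criterion, such as minimizing the sum of the two times, could give a different answer.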

We  realize  that  for  full  efficiency  in  a  parallel  environment  it  is  necessary  to  formulate 
numerical  schemes  that  are  most  suitable  for  the  new  technology.  In  this  respect,  implicit 
schemes,  or  models  formulated  with  the  rigid  lid  approximation  are  not  optimal.  The 
associated  algorithms  are  connected  with  the  inversion  of  large  matrices,  requiring  the 
simultaneous  solution  of  algebraic  equations.  Explicit  schemes  or  free  surface  models 
have  a  more  direct  applicability  to  parallel  programming,  because  their  formulation  is  very 
regular  and  requires  knowledge  of  variables  at  only  a  few  adjacent  points. 
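
To illustrate why explicit schemes map well onto subdomains, the C sketch below performs one explicit update on a block that carries a one-cell border of neighbor (or boundary) values. The five-point, diffusion-like formula and the names explicit_step and alpha are assumptions chosen for simplicity; they do not represent the equations of any particular ocean model.

```c
#include <assert.h>

/* One explicit step on the interior of a rows-by-cols block stored row by
 * row. The outermost ring of cells is the border holding values received
 * from neighboring subdomains. Each new value depends only on the old
 * value at that point and at its four adjacent points. */
void explicit_step(const double *old_f, double *new_f,
                   int rows, int cols, double alpha)
{
    int i, j;
    assert(rows >= 3 && cols >= 3);
    for (i = 1; i < rows - 1; i++) {
        for (j = 1; j < cols - 1; j++) {
            int c = i * cols + j;
            new_f[c] = old_f[c] + alpha * (old_f[c - cols] + old_f[c + cols] +
                                           old_f[c - 1] + old_f[c + 1] -
                                           4.0 * old_f[c]);
        }
    }
}
```

Because the update reads only adjacent points, each CPU needs just the border rows and columns from its neighbors before every step, which is the exchange counted earlier in this section.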

Work currently in progress is configuring an explicit, free surface, barotropic ocean
circulation model for parallel processing environments and verifying the portability of the
model, taking advantage of distributed computing and/or moderately parallel computer
systems. These findings will be presented in a subsequent document.

A  Installation and use guidelines

A.1  How to obtain Glenda

To obtain a copy of the Glenda software, e-mail your request to

seyfarth@whale.st.usm.edu

and a copy will be sent as soon as possible. In the future, an anonymous ftp server
may be set up to facilitate the distribution of the software.


A.2  Installation

If you received the .tar version, uudecode glenda.tar.Z.uue by typing:

>  uudecode  glenda.tar.Z.uue 

then  decompress  glenda.tar  by  typing: 

>  uncompress  glenda.tar.Z 

and,  finally  type: 

>  tar  -xf  glenda.tar 

If you received the .shar version, you have received about 6 e-mail messages
which should be saved into separate files. Each file should be edited to remove the
header lines as instructed within the files themselves. Then, each file is used as input for
the execution of the sh command as in:

>  sh glenda.shar.1

>  sh glenda.shar.2

Either distribution method creates Glenda directories within the current
directory. The directories created are as follows.

•  glenda/ is the top level directory. It contains the Glenda source code, a
Makefile, and additional subdirectories. The subdirectories are:

-  cgpp/ contains the C-Glenda preprocessor code

-  ts/ contains the tuple server code

-  include/ contains the include files for the ts/ and examples/ files

-  doc/ contains documentation

-  examples/ contains examples of Glenda programs.

It is essential to specify the architecture of the system (e.g., ARCH=SUN4). You may need to
edit the Makefile in the glenda/, ts/, and examples/ directories to prevent
compiling for the wrong architecture.

The Makefiles expect the environmental variable PVM_ROOT to be defined as the root
directory of the PVM 3.x software and expect to write into the $PVM_ROOT/bin/$ARCH
directory.

SGI machines require the linking option -lsun to access the XDR routines. This
must be added inside the ts/Makefile.

This Glenda version is written for PVM 3.x. However, it would need little effort to
connect to PVM 2.4. There is a file, pvmold.c, which is nearly complete for providing
PVM 3.x function calls, using the PVM 2.4 library. Unfortunately, pvmold.c does not
have an equivalent version of the pvm_tasks function, which is used by the tuple server
and the gl_user.c program to determine the tuple server task id number.

B  C-Glenda Preprocessor

The C-Glenda preprocessor converts the source file containing the Glenda functions into
a .c file, capable of being compiled by any C compiler. The C-Glenda preprocessor is
also capable of detecting syntax errors and specifying the type of the errors. Usage is as
follows:

>  cgpp filename.cg
(the source code file name must end in .cg).

B.1  Makefile Sample

The  following  sample  clarifies  how  to  compile  the  Glenda  programs.  Make  sure  that  there 
is  a  directory  bin/  in  the  home  directory  and  that  the  environmental  variable  ARCH  is 
properly  defined. 


ARCH       = RS6K
PVMBIN     = $(PVM_ROOT)/bin/$(ARCH)
PVMINCLUDE = -I$(PVM_ROOT)/include -I../include
CC         = c89
CFLAGS     = -g $(PVMINCLUDE)
PVMLIB     = $(PVM_ROOT)/lib/$(ARCH)
LIB        = -L$(PVMLIB) -lpvm3 -lm
USER       = ../gts/gluser.o

# This part converts a ".cg" file to a ".o" file.
# The default .SUFFIXES parameter had to be changed to
# accomplish this.
# The -mv command can be removed at your convenience.

.SUFFIXES:
.SUFFIXES: .o .cg .c .f .y .l .s

.cg.o:
	-cgpp $*.cg
	-$(CC) $(CFLAGS) -c $*.c
	-mv $*.c $*.x

# Place each master file and its corresponding worker file here.

all: $(PVMBIN)/a $(PVMBIN)/b

$(PVMBIN)/a: a.o
	$(CC) -o $(PVMBIN)/a a.o $(USER) $(LIB)
	chmod go+rx $(PVMBIN)/a

$(PVMBIN)/b: b.o
	$(CC) -o $(PVMBIN)/b b.o $(USER) $(LIB)
	chmod go+rx $(PVMBIN)/b

clean:
	rm -f *.o


C  Tuple  Server 

Before invoking the tuple server, PVM 3.x must be invoked with the proper configuration
parameters. Then, simply type:

>  gts 

at  the  prompt,  and  the  tuple  server  is  automatically  placed  in  the  background  where  it 
continually  tries  to  receive  tuples.  At  this  point,  the  master  process  is  ready  to  start. 

It is important not to invoke the tuple server before PVM has been invoked, nor the
master process before the tuple server has been invoked. A typical sequence of commands
is as follows:

>  pvmd pvmhosts &

>  gts

>  master_filename

D  Glenda Program Sample

This is an example of Glenda code. The master file is a.cg and the worker file is
b.cg. In this example, the process a outputs an array to multiple processes b and
waits for each process’s response.

a.cg

    #include <stdio.h>
    #include <stdlib.h>     /* atoi, malloc */
    #include <glenda.h>

    main(argc,argv)
    int argc;
    char *argv[];
    {
        int my_tid, a;
        int Size, N, i;
        int *Data;
        int j, Kids;
        int kid, step;

        my_tid = gl_mytid();

        /* Optional arguments: array size, number of steps, number of workers */
        if ( argc > 1 ) Size = atoi(argv[1]);
        else Size = 100000;

        if ( argc > 2 ) N = atoi(argv[2]);
        else N = 10;

        if ( argc > 3 ) Kids = atoi(argv[3]);
        else Kids = 10;

        /* Publish the parameters for the workers to read */
        gl_out ( "Size", Size );
        gl_out ( "N", N );

        /* Start the workers and give each one an index */
        for ( i = 0; i < Kids; i++ ) {
            gl_spawn ( "b" );
            gl_out ( "Kid", i );
        }

        /* The contents of Data are not initialized; only the transfer matters */
        Data = (int *) malloc ( Size * sizeof(int) );

        for ( j = 0; j < N; j++ ) {
            printf("Step %d of %d\n", j+1, N );
            for ( i = 0; i < Kids; i++ )
                gl_out ( "data", i, Data:Size );
            for ( i = 0; i < Kids; i++ ) {
                gl_in ( "OK", ? kid, ? step );
                printf("Got OK from %d for step %d\n", kid, step+1);
            }
        }

        /* Remove the parameter tuples from the tuple space */
        gl_in ( "Size", Size );
        gl_in ( "N", N );

        gl_exit();
    }

b.cg

    #include <stdio.h>
    #include <stdlib.h>     /* malloc */
    #include <glenda.h>

    main(argc,argv)
    int argc;
    char *argv[];
    {
        int my_tid, a;
        int Size, N, i;
        int *Data;
        int k;

        my_tid = gl_mytid();

        /* Read the shared parameters and consume this worker's index */
        gl_rd ( "Size", ? Size );
        gl_rd ( "N", ? N );
        gl_in ( "Kid", ? k );

        fprintf(stderr, "Kid %d, Size %d, N %d\n", k, Size, N);

        Data = (int *) malloc ( Size * sizeof(int) );

        /* Receive the array for each step and acknowledge it */
        for ( i = 0; i < N; i++ ) {
            gl_in ( "data", k, ? Data:Size );
            gl_out ( "OK", k, i );
        }

        gl_exit();
    }


The files a.cg and b.cg are located in the directory glenda/examples/.
Another example includes the files mm.c and mmworker.c, which are the master
and worker programs, written in PVM, for the execution of matrix multiplications. The
files, created by Josef Fritscher (Technical University of Vienna), were acquired from the
newsgroup comp.parallel.pvm.

The files mmgl.cg and mmgl_worker.cg are the corresponding Glenda
versions. The files mmto.cg and mmto_worker.cg are also Glenda versions of
mm.c and mmworker.c, but they make use of the gl_outto and gl_into
operations.

E  Getting  Help 

Please  communicate  problems  and  bug  reports  to 

seyfarth@whale.st.usm.edu

It would be helpful to describe your virtual machine configuration (hardware, PVM
version), include a short segment of code illustrating the problem, and describe how it fails.

Good luck with your work!